
    XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    BACKGROUND: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence data, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. DESCRIPTION: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones where available. Each sequence has been compared to the KOG database, and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. CONCLUSION: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results from various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at

    #Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

    Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered as correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify with 77.07% accuracy whether a pair of hashtags that compound will become popular in the near future (i.e., 2 months after compounding). At longer horizons of T = 6 and 10 months, the accuracies are 77.52% and 79.13%, respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases.
    Comment: 14 pages, 4 figures, 9 tables; published in the Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016)

    Convergence towards a European strategic culture? A constructivist framework for explaining changing norms.

    The article contributes to the debate about the emergence of a European strategic culture to underpin a European Security and Defence Policy. Noting both conceptual and empirical weaknesses in the literature, the article disaggregates the concept of strategic culture and focuses on four types of norms concerning the means and ends for the use of force. The study argues that national strategic cultures are less resistant to change than commonly thought and that they have been subject to three types of learning pressures since 1989: changing threat perceptions, institutional socialization, and mediatized crisis learning. The combined effect of these mechanisms would be a process of convergence with regard to strategic norms prevalent in current EU countries. If the outlined hypotheses can be substantiated by further research the implications for ESDP are positive, especially if the EU acts cautiously in those cases which involve norms that are not yet sufficiently shared across countries

    Native Speaker Perceptions of Accented Speech: The English Pronunciation of Macedonian EFL Learners

    The paper reports on the results of a study that aimed to describe the vocalic and consonantal features of the English pronunciation of Macedonian EFL learners as perceived by native speakers of English, and to find out whether native speakers who speak different standard variants of English perceive the same segments as non-native. A specially designed web application was employed to gather two types of data: a) quantitative (frequency of segment variables and global foreign accent ratings on a 5-point scale), and b) qualitative (open-ended questions). The analysis of the results points to the three most frequent markers of foreign accent in the English speech of Macedonian EFL learners: final obstruent devoicing, vowel shortening, and substitution of English dental fricatives with Macedonian dental plosives. It also reveals additional phonetic aspects that are poorly explained in the available reference literature, such as allophonic distributional differences between the two languages and intonational mismatch.

    PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

    A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by a newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit, and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu

    Fast index based algorithms and software for matching position specific scoring matrices

    BACKGROUND: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. RESULTS: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated the ESAsearch algorithm with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups of up to a factor of 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330.
    CONCLUSION: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than $|\mathcal{A}|^{m} + m - 1$, where $m$ is the length of the PSSM and $\mathcal{A}$ a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.
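
The two tasks named in this abstract, matching a PSSM against a sequence and deriving a score threshold from a p-value, can be illustrated with a deliberately naive sketch. This is not ESAsearch: it uses a plain sliding window instead of an enhanced suffix array, and it computes the full score distribution by dynamic programming rather than evaluating it lazily. The function names, the integer scores, and the i.i.d. background model are assumptions made for illustration only.

```python
from collections import defaultdict


def scan(pssm, seq, threshold):
    """Naive baseline: score every window of len(pssm) and keep the hits.

    pssm is a list of dicts, one per motif position, mapping a symbol
    to its (integer) score. Runs in O(len(seq) * len(pssm)) time.
    """
    m = len(pssm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pssm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits


def score_distribution(pssm, background):
    """Exact distribution of the total score under an i.i.d. background.

    Built column by column: each step convolves the distribution so far
    with the score distribution of the next PSSM column.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for symbol, q in background.items():
                nxt[s + column[symbol]] += p * q
        dist = dict(nxt)
    return dist


def threshold_for_pvalue(pssm, background, pvalue):
    """Smallest score t with P(score >= t) <= pvalue, or None if none exists."""
    dist = score_distribution(pssm, background)
    tail = 0.0
    best = None
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > pvalue:
            break
        best = s
    return best


if __name__ == "__main__":
    # Hypothetical two-column motif "AC" with made-up integer scores.
    motif = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
    ]
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    t = threshold_for_pvalue(motif, uniform, 0.1)
    print(t, scan(motif, "GACAC", t))
```

Integer scores keep the distribution table small; real-valued PSSMs would typically be rounded or scaled first, which is again an assumption of this sketch rather than something the abstract specifies.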

    Targeting a Versatile Actuator for EU-DEMO: Real Time Monitoring of Pellet Delivery to Facilitate Burn Control

    Core particle fueling, an essential task in the European demonstration fusion power plant EU-DEMO, relies on adequate pellet injection. However, pellets are fragile objects, and their delivery efficiency can hardly be assumed to be unity. Exploring kinetic control of the EU-DEMO1 scenario indicates that such missed-out pellets do cause a considerable problem for sustaining a burning plasma. Missed-out pellets can cause a severe drop in plasma density that in turn results in a potentially drastic loss of burn power. Efforts are under way at the ASDEX Upgrade (AUG) tokamak aiming to provide real-time monitoring of pellet arrival and announcement of missed-out cases to the control systems. To further optimize the controllers, system identification experiments have been performed to identify the dynamic response of the system to the actuator.